Final Report:

This is the final report for grant F49620-96-1-0472, which ended on Dec. 31, 2000. During
the period from September 1996 through December 2000, the main objectives were to:


1.  Build  a  research  infrastructure  (equipment,  faculty,  students) 

2.  Develop  faculty  research  opportunities 

3.  Develop  a  graduate  student  research  program  within  the  Computer  Science  and  Electrical 
Engineering  Departments 

Summary: 

The Department of Computer Science was positively impacted by this funding, which helped to
ensure research opportunities for students and faculty in both the Master’s and Ph.D.
degree programs. The Computer Science Ph.D. program was initiated in October 1995, so
this grant provided substantial help in attracting quality students into the new
program. Over the same period, the Electrical Engineering department was similarly influenced
by the funding, though it offered a Master’s degree as its only graduate-level program.
However, the support that the grant provided its faculty can be credited in part for the
approval and implementation of their Ph.D. program in October 2002. The list of publications
below attests to the measurable efforts in training and supervision of both Master’s and
Ph.D. students in state-of-the-art research areas.

The UTSA faculty supported under this grant also benefited in several ways, such as funding
support for graduate students to work on specific areas of faculty research, opportunities to
organize interacting research teams, the initiation of new research areas, preparation time for other
funding opportunities, higher research standards among students in EE and CS,
the establishment of a university-approved research center as a focal point for the faculty’s research, and
support for equipment purchases for two laboratories that were used extensively under this grant.

Approach 

1. Develop a focused research program in telecommunications, networks, and network-based
computing

2.  Establish  a  computational  research  facility 

3.  Seek  local  industry  support  for  research 

4.  Increase  recruitment  of  Ph.D.  and  MS  students 

The approach was to develop research focus areas that complement and enhance faculty research
interests; recruit graduate students to train in the designated research areas through course
lectures and independent studies; establish computational facilities to support the research
efforts; and pursue collaborative research projects.

Results 

1.  Two  National  Science  Foundation  Career  Awards 

2. Growing network research reputation in wireless & mobile network protocols.

3.  The  Center  for  Advanced  Computing  and  Network  Research. 

4.  Expansion  of  faculty  and  student  research  interests  and  capabilities 

5. New labs in wireless & mobile communications, and parallel & distributed systems

Strong group research effort in wireless and mobile network protocols. Research spanning the
breadth of network communications and network-based computing. University approval (based
on AFOSR funding) for The Center for Advanced Computing & Network Research (July 1999).
Established a Wireless & Mobile Communications Laboratory; a Parallel & Distributed Systems
Laboratory networked with an ATM switch and 10- and 100-Mbit Ethernet interconnections; and
the High-speed Gigabit Communications Laboratory.

Deliverables 

1.  Performance  based  simulators:  networks;  memory  hierarchy;  routing;  switches. 

2.  Analytical  and  experimental  results. 




3.  Public  release  of  routing  protocols  for  wireless  &  mobile  networks. 

4.  Advance  research  capabilities  for  future  projects  with  AFOSR. 

5.  Research  trained  student  graduates. 

6.  Growing  network  research  reputation  in  wireless  &  mobile  network  protocols. 

7.  New  labs  in  wireless  &  mobile  communications,  parallel  &  distributed  systems 

An enumeration of the research efforts collectively funded by the above URISP award is provided
below. The research areas are classified under two technical clusters: (i) Communications &
Networks,  and  (ii)  Parallel  &  Distributed  Computing.  Key  aspects  reported  on  include:  (1) 
Wireless  networks;  (2)  Transmission  rate  controller  in  ATM  Networks;  (3)  Fast  parallel  and 
distributed  algorithms;  (4)  Memory  Access  Mechanism;  (5)  Metacomputing;  (6)  Optimal  resource 
distribution  in  multiprocessor  networks;  (7)  Compilation  Techniques  for  Parallel  and  Distributed 
Systems. 

The AFOSR (URISP) Grant F49620-96-1-0472 was a major reason for the establishment of the
Center for Advanced Computing and Network Research. The Center’s activities encompass
research, support for graduate and undergraduate student research, community outreach
programs and alliances with local information technology companies. The Director of the Center
was Prof. Robert Hiromoto, who is the PI for this grant and is now the Chair of the Department
of Computer Science at the University of Idaho.

Cluster  I:  COMMUNICATIONS  &  NETWORKS 

1  Mobile  and  Wireless  Communications 

1.1  Objectives 

During  the  period  of  this  grant,  we  established  several  unified  areas  of  research.  These  areas  are 
categorized  as  1)  Protocols  for  Mobile,  Wireless  Networks,  2)  Routing  Techniques  for  Wireless 
Ad-Hoc  Communication  Networks  and  3)  Formal  Analytical  Models  for  Mobile  and  Wireless 
Networks. 

Digital  Wireless  Networks 
Objective 

The objective is the design and analysis of low-complexity multi-user (MU) detectors for future
wireless systems employing code-division multiple access (CDMA). Desirable transceivers must meet
a complexity-performance trade-off in the presence of impairments such as unknown, time-varying
(fading) channels and user motion. Hence, DSP-implementable receiver structures for joint
equalization/diversity and channel estimation are investigated, with a focus on performance
evaluation in the presence of realistic worst-case channel conditions (large delay and Doppler
spreads) and multiple access interference (MAI) conditions (signal-to-interference ratios of -10 dB).

Approach 

Extract signature information from CDMA modulation for user demodulation/decoding. Analyze
multi-packet reception capabilities for all-digital wireless transceivers based on the minimum
mean-squared error principle. Determine performance in terms of multi-path channel spread, near-far
ratio for multi-access interference and detector observation window.
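
As a rough illustration of the minimum mean-squared error principle and of the near-far ratio mentioned above, the sketch below compares a matched-filter bank with a linear MMSE detector for two synchronous users in a noise-free toy case. It is not the blind adaptive detector studied in this project: it assumes known signatures and amplitudes, and the signature vectors, amplitudes, noise variance and function names are all hypothetical.

```python
# Illustrative sketch only: two synchronous CDMA users with known
# signatures and amplitudes (assumptions, not the project's detector).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sign(x):
    return 1 if x >= 0 else -1

def matched_filter_bank(r, signatures):
    """Correlate the received chip vector with each user's signature."""
    return [dot(s, r) for s in signatures]

def mmse_detect(r, signatures, amplitudes, noise_var):
    """Linear MMSE bit decisions for exactly two users.

    Applies (R + noise_var * A^-2)^-1 to the matched-filter outputs,
    where R is the signature correlation matrix and A holds the
    received amplitudes. The 2x2 inverse is written out explicitly.
    """
    s1, s2 = signatures
    a1, a2 = amplitudes
    y1, y2 = matched_filter_bank(r, signatures)
    rho = dot(s1, s2)
    m11 = dot(s1, s1) + noise_var / a1 ** 2
    m22 = dot(s2, s2) + noise_var / a2 ** 2
    det = m11 * m22 - rho * rho
    z1 = (m22 * y1 - rho * y2) / det
    z2 = (-rho * y1 + m11 * y2) / det
    return [sign(z1), sign(z2)]

# Hypothetical near-far case: user 2 is received 20 dB above user 1.
s1 = [0.5, 0.5, 0.5, 0.5]
s2 = [0.5, 0.5, 0.5, -0.5]
amps = [1.0, 10.0]
bits = [1, -1]
r = [amps[0] * bits[0] * c1 + amps[1] * bits[1] * c2 for c1, c2 in zip(s1, s2)]

mf_decisions = [sign(y) for y in matched_filter_bank(r, [s1, s2])]
mmse_decisions = mmse_detect(r, [s1, s2], amps, noise_var=0.01)
```

In this toy case the strong interferer flips the matched-filter decision for user 1, while the MMSE combination recovers both transmitted bits.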

To counteract typical fading wireless channels and multiple access interference (MAI) dominated
scenarios, the transceiver architecture must not only be efficient, but necessarily adaptive.
Broadly speaking, a space-time receiver architecture is required that exploits in the most effective
manner all available degrees of freedom (both temporal and spatial). Accordingly, an all-digital
receiver is desired that performs the major sub-tasks (diversity combining, equalization, timing
and carrier recovery) jointly in a data-adaptive manner to achieve signal separation/multi-user
interference cancellation.

Specifically, we studied linear adaptive detectors that exploit the underlying signal subspace
structure resulting from CDMA modulation; these appear attractive for counteracting typical fading
wireless channels and multiple access interference (MAI) dominated scenarios. The benefits of a
blind adaptive detector that assumes no information regarding the interfering user parameters
(timing, code) and that operates only in the signal space (i.e., the space occupied by all
active users) are highlighted for the AWGN-only channel with all users synchronous. The detector
obtains an estimate of the signal subspace via a one-shot eigen-decomposition of the received
data. However, the presence of either (interfering) user asynchronism or multipath significantly
degrades the performance of this approach. Further, the detector is known to be
sensitive to imperfections in timing/carrier recovery for the desired user. To render subspace
detectors useful for fading multipath channels (as opposed to AWGN) and robust to model
imperfections such as timing/carrier offset, joint signal subspace tracking and adaptive interference
cancellation/timing and carrier estimation are necessary; a comparative performance analysis of
some candidate approaches was performed. Continuing analytical and simulation experiments
are required to quantify performance in terms of multipath channel spread in time and Doppler,
near-far ratio for the MAI, and detector observation window prior to a recommendation for a
DSP-transceiver architecture.

Results 

In simulation, the approach performs better than the other candidates on slow/moderate fading
channels; fast fading channels need further improvement. Slotted Aloha with an MMSE detector is
nearly optimal. Heavy load response needs further improvement.

Deliverables

A DSP-implementable digital wireless transceiver architecture for fading channels; design of a
packet CDMA protocol exploiting link-level advances (multi-user detection).

Protocols  for  Mobile,  Wireless  Networks 

The goal of the “Protocols for Mobile, Wireless Networks” project is to study protocols for
mobile, ad-hoc networks. Because dynamic routing is the key research challenge in ad-hoc networks,
we emphasize routing protocols. However, our goal also includes the study of medium-access
control (MAC) protocols for ad-hoc networks. The traditional layered structure of network
protocols is considered less desirable for wireless networks because of the highly unreliable and
time-varying characteristics of radio channels. Achieving acceptable performance in wireless
networks requires addressing technical issues simultaneously at different layers. Thus, one
major research emphasis was, first, the unique characteristics of the radio link layer that affect the
performance of higher networking layers and, second, how key link-layer-specific information
can be exploited by the higher layers for better performance. As a part of this project we are
also investigating fast simulation methodologies for efficient performance evaluation of wireless
network protocols.

Routing  Techniques  for  Wireless  Ad-Hoc  Communication  Networks 

This project developed routing algorithms and analyzed switch designs to provide fast and efficient
communication in wireless, mobile and ad-hoc networks. Wireless networks consisting of only
low-powered, mobile communicators pose special challenges in routing packets between
non-adjacent nodes. The goal is to design routing methods that sustain performance under overload
conditions and have low overhead and energy consumption. The application of
directional antennas was also studied, and efficient protocols were proposed.

Formal  Analytical  Models  for  Mobile  and  Wireless  Networks 

The goal of this project is to develop formal analytical models to predict the performance of
mobile and wireless communication networks from both a theoretical and a practical aspect.
Assessing the performance of a particular mobile and wireless protocol from empirical studies
based on simulations alone is an imprecise and potentially inaccurate approach. Without a sound
theoretical framework, validation between the simulated protocol and its extension to actual
practice is tenuous at best.

1.2 Status of Effort


Simulation model building: We have developed simulators for routing and MAC protocols
for ad-hoc networks. These include the following:

(i) Augmenting MaRS (Maryland Routing Simulator) for simulating ad-hoc routing protocols;

(ii) Developing some key routing protocols in MaRS;

(iii) Using the popular network simulator NS2 to develop a detailed simulation model of a
routing protocol called AODV;

(iv) Developing several CSMA-based MAC protocol models in a home-grown event-driven simulator.

Performance evaluation of routing protocols: We have completed a thorough performance
evaluation of several routing protocols for ad-hoc networks. They include current-generation
on-demand protocols as well as more traditional, pro-active protocols.

Development  of  Internet  standards:  We  participated  actively  in  developing  the  Internet 
draft  for  the  AODV  protocol. 

Optimization  of  routing  protocols: 

(i) Query localization: we developed a mechanism to localize the extent of broadcast floods in
on-demand routing protocols.

(ii) Multipath techniques: we developed a multipath technique to utilize redundant routes in
on-demand protocols. This reduces the frequency of flooding.

(iii) Several other optimizations to reduce the impact of flooding are under study.
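
The query localization idea in item (i) can be illustrated with a small sketch. The rule used here is an assumption made for illustration, not a statement of the published mechanism: a route request is dropped once it has passed through more than k nodes that were not on the last known route. The topology, node names and the function `flood` are hypothetical.

```python
from collections import deque

def flood(graph, source, old_route=None, k=None):
    """Return the set of nodes reached by a route request from source.

    With k=None the request floods the whole connected network.
    Otherwise a copy of the request is dropped once it would have
    visited more than k nodes that are not on old_route (an assumed,
    simplified localization rule).
    """
    on_route = set(old_route or [])
    best = {source: 0}              # fewest off-route nodes seen so far
    queue = deque([(source, 0)])
    while queue:
        node, off = queue.popleft()
        if off > best[node]:
            continue                # a better copy already arrived
        for nbr in graph[node]:
            nbr_off = off + (0 if nbr in on_route else 1)
            if k is not None and nbr_off > k:
                continue            # localized: drop this copy
            if nbr not in best or nbr_off < best[nbr]:
                best[nbr] = nbr_off
                queue.append((nbr, nbr_off))
    return set(best)

# Hypothetical topology: the old route A-B-C-D has lost link B-C,
# and node E offers a one-hop detour around the break.
graph = {
    "A": ["B"],
    "B": ["A", "E"],
    "C": ["E", "D"],
    "D": ["C"],
    "E": ["B", "C", "F"],
    "F": ["E", "G"],
    "G": ["F", "H"],
    "H": ["G"],
}
old_route = ["A", "B", "C", "D"]

full = flood(graph, "A")
localized = flood(graph, "A", old_route=old_route, k=1)
```

In this example the localized request still reaches the destination D through E, but nodes F, G and H never see it, which is the kind of saving in flood extent that the optimization targets.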

MAC protocols: We developed a multichannel CSMA-based MAC protocol for wireless, multihop
networks. It uses a form of “soft” channel reservation to reduce packet collisions.
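
One way to picture a “soft” channel reservation is the selection rule sketched below. The rule is an assumption for illustration, since the report does not spell out the mechanism: a node prefers the channel on which its last transmission succeeded if that channel is currently sensed idle, otherwise it picks a random idle channel, and it defers if no channel is idle. The function name and parameters are hypothetical.

```python
import random

def pick_channel(idle_channels, last_success, rng=random):
    """Choose a transmit channel under an assumed soft-reservation rule.

    idle_channels: set of channel ids currently sensed idle.
    last_success:  channel id of the last successful transmission,
                   or None if there has been none.
    Returns a channel id, or None when the node should back off.
    """
    if not idle_channels:
        return None                 # every channel busy: defer
    if last_success in idle_channels:
        return last_success         # keep the "softly reserved" channel
    # Reservation is only a preference, so fall back to any idle channel.
    return rng.choice(sorted(idle_channels))

preferred = pick_channel({1, 2, 3}, last_success=2)
deferred = pick_channel(set(), last_success=2)
fallback = pick_channel({1, 3}, last_success=2, rng=random.Random(0))
```

Because each node tends to stay on its own channel without holding it exclusively, neighbors drift toward different channels and collide less often, while no capacity is lost when a preferred channel is busy.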

Ad-Hoc testbed development: A testbed involving several Linux laptops with Lucent WaveLAN
wireless interfaces is under development. The testbed will implement the AODV protocol as an
extension of ARP. The testbed will be used for experimental performance evaluation of the
AODV protocol with realistic networking applications.

Parallel simulation: We are developing parallel simulation technology to support fast simulation
of large-scale wireless networks. Certain performance optimization techniques for a rollback-based
parallel simulation mechanism, called Time Warp, have been developed. Currently,
parallelization of a public domain network simulator (NS2) on workstation clusters is underway.

We have investigated network layer routing algorithms for wireless, mobile and ad-hoc networks
(MANETs). We have conducted extensive performance analyses of various existing algorithms
and designed variations of these for better performance. We have also looked at multicast
communication on MANETs. Currently, we are designing newer algorithms that facilitate quality of
service (QoS) support and implementation of a differentiated services framework.

1.3 Status of Accomplishments

Our simulations of newly proposed wireless routing protocols showed significant performance
benefits in comparison with traditional techniques.

We have conducted extensive analysis of routing techniques for MANETs and exposed several
weaknesses in two routing techniques called AODV and DSR. We designed a new routing
technique that provides good performance under a variety of traffic loads and sustains its performance
when the network is overloaded. We have shown that this algorithm has low overhead, which
might make it suitable for implementation in low-powered communication devices. Some of
the results will be presented in an upcoming conference.

We  have  analyzed  multicast  group  communication  in  MANETs.  We  have  shown  that  in  crowded 
MANETs,  where  each  node  has  several  neighbors  sharing  a  common  radio  channel,  tree  based 
multicasts  perform  better  than  mesh  based  multicasts.  Recently  we  improved  on  a  mesh-based 
multicast  technique  to  provide  100%  improvement  in  throughput  in  dense  MANETs. 

An analytical model has been designed for analyzing the performance of the Query Localization
(QL) techniques proposed by Castaneda and Das. The model also points out the importance
of boundary effects on the performance of the QL technique.

1.4  Personnel  Supported 

Rajendra  V  Boppana  -  Faculty  investigator 

Samir  Das  -  Faculty  investigator 

Robert Hiromoto - Faculty investigator

Asis  Nasipuri  -  Postdoc  fellow 

Saman  Desilva  -  Network  engineer  and  PhD  student 

Robert  Castaneda  -  PhD  student 

Suvro  Ghosh  -  MS  student 

Ramakanth  Gunuganti  -  PhD  student 

Kevin Jones - PhD student

Satyadeva P Konduru - PhD student

Mahesh Marina - PhD student

Sumit  Roy  -  Subcontractor  (University  of  Washington) 

Veronica  Schiaffini  -  MS  student  (graduated) 

Shanmuka Voona - PhD student

Jiangtao Yan - MS student (graduated)

Shengchun  Ye  -  MS  student 

Jun  Zhuang  -  MS  student  (graduated) 

Note  that  some  of  the  above  personnel  were  only  partially  supported  by  the  project  funds. 


1.5  Technical  Publications 

Journals  - 

Published  or  accepted  for  publication  in  refereed  journals  or  conference  proceedings: 

1. Supavadee Aramvith, Chia-Wen Lin, Sumit Roy, and Ming-Ting Sun, “Wireless video transport
using conditional retransmission and low-delay interleaving,” submitted to IEEE Trans.
Circuits and Systems for Video Technology, Special Issue on Wireless Video, Apr. 2001.

2. R. V. Boppana and S. Chalasani, “Fault-Tolerant Communication with Partitioned
Dimension-Order Routers,” to appear in IEEE Transactions on Parallel and Distributed Systems,
Special Issue on Fault-Tolerant Routing.

Conference  Proceedings/Presentations 

1. Supavadee Aramvith, Chia-Wen Lin, Sumit Roy, and Ming-Ting Sun, “Wireless video transport
using conditional retransmission and low-delay interleaving,” Proc. IEEE Int. Symp.
on Circuits and Systems, V-101-104, Jun. 2001, Sydney, Australia.

2. H. Yan and S. Roy, “A FREQUENCY DOMAIN METHOD FOR CHANNEL ESTIMATION
IN MULTIRATE COMMUNICATION SYSTEMS,” IEEE International Conference
on Acoustics, Speech, and Signal Processing, Salt Lake City, Utah, May 7-11, 2001, vol. 4,
pp. 2057-2060.

3. H. Yan and S. Roy, “FIR CHANNEL IDENTIFICATION IN MULTIRATE COMMUNICATION
SYSTEMS WITH A SUBSPACE METHOD,” IEEE International Conference on
Acoustics, Speech, and Signal Processing, Salt Lake City, Utah, May 2001.

4. K. Jones and S. R. Das, “Time-Parallel Algorithms for Simulation of Multiple Access
Protocols,” in Proceedings of the 9th International Symposium on Modeling, Analysis and
Simulation of Computer and Telecommunication Systems (MASCOTS 2001), Cincinnati,
Ohio, August 2001.

5.  N.  Jain,  S.  R.  Das  and  A.  Nasipuri,  “A  Multichannel  MAC  Protocol  with  Receiver-Based 
Channel  Selection  for  Multihop  Wireless  Networks,”  Proceedings  of  the  9th  Int.  Conf.  on 
Computer  Communications  and  Networks  (IC3N),  Phoenix,  Oct  2001. 

6.  S.  R.  Das,  C.  E.  Perkins,  E.  M.  Royer  and  M.  K.  Marina,  “Performance  Comparison  of 
Two  On-demand  Routing  Protocols  for  Ad  Hoc  Networks,”  IEEE  Personal  Communications 
Magazine,  special  issue  on  Mobile  Ad  Hoc  Networks,  Vol.  8,  No.  1,  Feb  2001,  pages  16-29. 

7. A. Nasipuri, R. Castaneda and S. R. Das, “Performance of Multipath Routing for
On-Demand Protocols in Ad Hoc Networks,” ACM/Kluwer Mobile Networks and Applications
(MONET) Journal, Vol. 6, No. 4, 2001, pages 339-349.

8.  M.  K.  Marina  and  S.  R.  Das,  “Performance  of  Route  Caching  Strategies  in  Dynamic  Source 
Routing,”  Proceedings  of  the  2nd  Wireless  Networking  and  Mobile  Computing  (WNMC), 
Phoenix,  April  2001.  In  conjunction  with  the  Int’l  Conference  on  Distributed  Computing 
Systems  (ICDCS)  2001. 

9. M. K. Marina and S. R. Das, “On-demand Multipath Distance Vector Routing for Ad Hoc
Networks,” in Proceedings of the International Conference for Network Protocols (ICNP),
Riverside, Nov. 2001.

10.  S.R.  Voona,  R.  Gunuganti,  and  R.V.  Boppana,  “On  the  Performance  of  Tree  and  Mesh 
Based  Multicast  Routing  in  Mobile  and  Ad-Hoc  Networks,”  Submitted  to  IEEE  Infocom, 
April  2000. 

11.  A.  Abouzeid,  S.  Roy  and  M.  Azizoglu,  “Stochastic  modeling  of  TCP  over  lossy  links,”  in 
Proc.  INFOCOM’2000,  2000,  pp.  1724-1733. 

12.  S.  R.  Das,  R.  Castaneda  and  J.  Yan,  “Simulation  Based  Performance  Evaluation  of  Mobile, 
Ad  Hoc  Network  Routing  Protocols,”  ACM/Baltzer  Mobile  Networks  and  Applications 
(MONET)  Journal,  July  2000,  pages  179-189. 

13. A. Nasipuri and S. R. Das, “Multichannel CSMA with Signal Power-Based Channel
Selection for Multihop Wireless Networks,” Proceedings of the IEEE Vehicular Technology
Conference (VTC), Boston, Sept 2000.

14. S. Desilva and S.R. Das, “Experimental Evaluation of a Wireless Ad Hoc Network,”
Proceedings of the 9th Int. Conf. on Computer Communications and Networks (IC3N), Las
Vegas, October 2000.

15. R.V. Boppana and M.K. Marina, “An Adaptive Routing Algorithm for Mobile and Ad-Hoc
Networks,” Submitted to IEEE Infocom, April 2000.

16. S.R. Das, C. E. Perkins and E. M. Royer, “Performance Comparison of Two On-demand
Routing Protocols for Ad Hoc Networks,” Proceedings of INFOCOM 2000 Conference,
Tel-Aviv, Israel, March 2000.

17. R.V. Boppana and C.S. Raghavendra, “Designing Efficient Benes and Banyan Based
Input-Buffered ATM Switches,” International Conference on Communications (ICC), June 1999.

18.  R.  V.  Boppana,  M.  K.  Marina,  S.  P.  Konduru,  “An  Analysis  of  Routing  Techniques  for 
Mobile  and  Ad-Hoc  Networks,”  International  Conference  on  High  Performance  Computing, 
December  1999. 

19.  S.  R.  Das,  R.  Castaneda  and  J.  Yan,  “Simulation  Based  Performance  Evaluation  of  Mobile, 
Ad-Hoc  Network  Routing  Protocols,”  accepted  for  publication  in  ACM/Baltzer  Wireless 
Network  Journal,  1999. 

20.  A.  Nasipuri  and  S.  R.  Das,  “On-demand  Multipath  Routing  for  Mobile  Ad-Hoc  Networks,” 
to  appear  in  the  Proceedings  of  the  8th.  IEEE  International  Conference  on  Computer 
Communications  and  Networks  (IC3N),  Boston,  Oct.  1999. 

21.  A.  Nasipuri,  J.  Zhuang  and  S.  R.  Das,  “A  Multichannel  CSMA  MAC  Protocol  for  Multihop 
Wireless  Networks,”  to  appear  in  the  Proceedings  of  the  IEEE  Wireless  Communications 
and  Networking  Conference  (WCNC),  New  Orleans,  September,  1999. 

22. R. Castaneda and S. R. Das, “Query Localization Techniques for On-Demand Routing
Protocols for Mobile Ad-Hoc Networks,” Proceedings of the Fifth ACM International
Conference on Mobile Computing and Networking (MOBICOM), Seattle, August, 1999.

23.  K.  Jones  and  S.  R.  Das,  “Combining  Optimism  Limiting  Schemes  in  Time  Warp  Based 
Parallel  Simulations,”  Proceedings  of  the  1998  Winter  Simulation  Conference,  Washington, 
DC,  Dec.  1998,  pages  499-505. 

24.  S.  Desilva  and  S.  R.  Das,  “Experimental  Evaluation  of  Channel  State  Dependent  Scheduling 
in  an  In-building  Wireless  LAN,”  Proceedings  of  the  7th.  IEEE  International  Conference  on 
Computer  Communications  and  Networks  (IC3N),  Lafayette,  LA,  Oct  1998,  pages  414-421. 

25. S. R. Das, R. Castaneda, J. Yan and R. Sengupta, “Comparative Performance Evaluation
of Routing Protocols for Mobile, Ad-Hoc Networks,” Proceedings of the 7th IEEE
International Conference on Computer Communications and Networks (IC3N), Lafayette, LA,
Oct 1998, pages 153-161.

26. R. Castaneda, S. R. Das and M. Marina, “Query Localization Techniques for On-Demand
Routing Protocols for Mobile Ad-Hoc Networks,” submitted to ACM/Baltzer Wireless
Networks (WINET) Journal (invited as an expanded version of the MOBICOM’99 paper).


1.6 Transitions/Interactions




Software  in  public  domain:  We  are  releasing  a  simulation  model  of  the  AODV  protocol  as  a  part  of 
the  public  domain  network  simulator  NS2.  NS2  being  a  widely  used  simulator  for  internetworking 
protocols,  our  work  will  get  very  good  visibility. 

Industry interactions: The development and evaluation of the AODV protocol is carried out in
cooperation with Sun Microsystems Labs, Palo Alto, and Nokia Research Center, Mountain
View. A faculty investigator, Samir Das, spent the Spring’99 semester at Sun Microsystems as a
part of this effort. This effort resulted in the submittal of an Internet Draft of the specification of
the AODV protocol.

Multicast algorithms implemented in the widely used NS-2 network simulator are being
distributed to researchers for their use and comments. We are currently interacting with researchers
at UCLA (S.-J. Lee), USC/ISI (K. Obraczka), and Sun Microsystems (C. Perkins).

1.7  Patent  Disclosures  -  None. 

1.8 Honors/Awards
Samir  Das: 

• NSF Faculty Early CAREER Award, 1998-2003, Parallel Discrete Event Simulation:
Protocols, Tools and Applications.

•  NSF  Networking  Research  Award,  1999-2003,  Collaborative  Proposal:  Protocols  for  Mobile 
Ad  Hoc  Networking  (with  Co-PI  Asis  Nasipuri,  in  collaboration  with  Nitin  Vaidya). 


2  Transmission  Rate  Controller  for  Congestion  Control  in  ATM 
Networks 

2.1  Objectives 

This research concerned the performance improvement of a traffic controller that
simultaneously manages congestion control and connection admission control for asynchronous
transfer mode (ATM) networks. ATM is a key technology for
integrating multimedia services in high-speed networks. ATM networks supporting multimedia
services have to be capable of handling bursty traffic, satisfying various quality of service
(QoS) and bandwidth requirements, and achieving high system utilization. The ATM Forum
has specified several service categories in relation to traffic management in an ATM network.
These service categories are constant bit-rate (CBR), variable bit-rate (VBR),
available bit-rate (ABR) and unspecified bit-rate (UBR). For the purposes of this study, the traffic
(service) is categorized into two types: real-time (type 1) and non-real-time (type 2). Video
and voice services are examples of type 1 traffic, and data services are examples of type 2 traffic.
Because of the unpredictable statistical fluctuations in the traffic flows of multimedia services,
network congestion may still occur even though an appropriate connection admission control
scheme is provided. In order to prevent the QoS from severely degrading
during short-term congestion, appropriate congestion control must be provided.

In  this  study  we  have  developed  two  control  techniques  for  congestion  control  in  ATM  networks 
for  ABR  traffic. 

The fuzzy approach exhibits a soft behavior, with an ability to adapt itself to the dynamic, imprecise,
and bursty environment of an ATM network. Such a network requires a sophisticated real-time traffic
controller that manages connection admission control and congestion control to guarantee the QoS
for existing calls and to achieve high system utilization. All congestion control and connection
admission control schemes that utilize either buffer thresholds or capacity estimation have
limitations due to incomplete statistics of input traffic. Fuzzy logic systems have been developed to
overcome some of these limitations.

In this study, we added a transmission rate manager for type 2 traffic to the Fuzzy Traffic
Controller and improved the congestion controller design to achieve superior system utilization and
a low cell loss ratio. The congestion controller is improved through the modification of
its rule structure. The output of the congestion controller drives a transmission rate
manager that adjusts the transmission rate of data sources of type 2 traffic to obtain high system
utilization while lowering the cell loss ratio.

2.2  Status  of  Effort 

The input traffic coming from customers is categorized into two types: real-time (type 1) and
non-real-time (type 2). The network system supports two separate finite buffers, with size K1 for
type 1 traffic and size K2 for type 2 traffic. When a buffer is full, incoming cells are blocked and
lost. The system reserves a portion of its capacity for type 1 traffic and the remaining portion for
type 2 traffic. When there is unused type 1 or type 2 capacity, it is used for the other type.
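
The buffering and capacity-sharing rules above can be sketched as a per-slot update. This is only an illustrative model under stated assumptions: time is slotted, capacity is counted in cells per slot, and arrivals are enqueued before service within a slot. The function `serve_slot` and the numeric values are hypothetical.

```python
def serve_slot(queues, arrivals, buffers, reserved):
    """Advance the two-buffer system by one time slot.

    Each argument is a (type 1, type 2) pair measured in cells:
    queues   - current queue lengths
    arrivals - cells arriving in this slot
    buffers  - buffer sizes (K1, K2)
    reserved - capacity reserved for each type in this slot
    Returns (new_queues, lost, served) as tuples.
    """
    new_q, lost = [], []
    for q, a, k in zip(queues, arrivals, buffers):
        admitted = min(a, k - q)    # cells that find the buffer full are lost
        new_q.append(q + admitted)
        lost.append(a - admitted)

    # Each type is first served from its own reserved capacity.
    served = [min(q, c) for q, c in zip(new_q, reserved)]
    spare = [c - s for c, s in zip(reserved, served)]

    # Unused capacity of one type is then used for the other type.
    for i, j in ((0, 1), (1, 0)):
        extra = min(new_q[i] - served[i], spare[j])
        served[i] += extra
        spare[j] -= extra

    new_q = [q - s for q, s in zip(new_q, served)]
    return tuple(new_q), tuple(lost), tuple(served)

# Hypothetical slot: type 1 is lightly loaded, type 2 is nearly full.
result = serve_slot(queues=(0, 8), arrivals=(2, 5),
                    buffers=(10, 10), reserved=(6, 4))
```

In this example three type 2 cells are lost to buffer overflow, and the four cells of capacity left unused by type 1 are borrowed so that eight type 2 cells are served in the slot.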

The performance measures estimator measures the queue length q, the queue length change rate Dq
and the cell loss probability (ratio) pi for type 1 and type 2 traffic, and feeds these measures to the fuzzy
congestion controller. The fuzzy congestion controller generates a control action y according to a
set of input linguistic variables q, Dq and pi and a set of built-in fuzzy control rules. A negative
value of y denotes that the system has a certain degree of congestion, so a new call has little chance
of being accepted. A positive value of y indicates that the system is free of congestion to a
certain degree, so new calls have a good chance of entering the network, and existing calls’ rates can
be adjusted to achieve higher system utilization. This report presents an improved congestion
controller and a transmission rate manager for type 2 traffic to achieve overall higher system
utilization. The fuzzy admission controller is the same as in the previous report. A call accept/reject
control action Z is determined by the linguistic variables pi, y, and the available capacity Ca of the
network for both type 1 and type 2. The fuzzy traffic controller simultaneously handles congestion
control and call admission control.

Fuzzy  Congestion  Controller 

The queue length q, queue length change rate Dq, and cell loss ratio pi are used as the input
linguistic variables. The congestion control action y is used as the output linguistic variable.

We defined the term sets for queue length, rate of change of queue length, and cell loss ratio,
and developed membership functions for these term sets. The rule structure for the congestion
controller was obtained from knowledge available in the literature and from extensive
simulation. The max-min inference method is used for the inference engine and Tsukamoto’s
defuzzification method for the defuzzifier, yielding a crisp value yO of the control action y.
Based on the crisp value yO, the transmission rate manager, described in the next section, sends
a transmission rate control command that adjusts the incoming type 2 traffic to achieve high
system utilization while improving QOS provisioning through a reduced cell loss ratio.
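
The inference chain described above (fuzzification, max-min inference, Tsukamoto defuzzification) can be sketched as follows. The term sets, membership breakpoints, and the four rules are invented for illustration and are not the rule structure developed in this study; only the overall mechanism follows the text.

```python
def ramp_up(x, lo, hi):
    """Membership that rises linearly from 0 at lo to 1 at hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def congestion_action(q, dq, p, buffer_size=100):
    """Return a crisp control action in [-1, 1]; negative means congestion."""
    # Fuzzification with two illustrative terms per input (assumed shapes).
    q_long = ramp_up(q, 0.2 * buffer_size, 0.8 * buffer_size)
    q_short = 1.0 - q_long
    dq_pos = ramp_up(dq, -5.0, 5.0)
    dq_neg = 1.0 - dq_pos
    p_high = ramp_up(p, 1e-5, 1e-3)
    p_low = 1.0 - p_high

    # Min implements AND inside a rule; max merges rules sharing a consequent.
    # Hypothetical rules:
    #   long queue AND growing queue -> congested;   high loss -> congested
    #   short queue AND low loss -> free;  long AND shrinking AND low loss -> free
    w_negative = max(min(q_long, dq_pos), p_high)
    w_positive = max(min(q_short, p_low), min(q_long, dq_neg, p_low))

    # Tsukamoto: consequent memberships are monotone, here mu_neg(y) = -y on
    # [-1, 0] and mu_pos(y) = y on [0, 1], so each inverts to one crisp value.
    y_negative = -w_negative
    y_positive = w_positive
    total = w_negative + w_positive
    if total == 0.0:
        return 0.0
    # Weighted average of the per-term crisp values.
    return (w_negative * y_negative + w_positive * y_positive) / total
```

An empty buffer with no loss yields a positive action, while a full, growing buffer with high loss yields a negative one, matching the sign convention used in the text.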

Transmission  Rate  Manager 

A  negative  value  of  yO  indicates  the  system  has  a  certain  degree  of  congestion.  A  positive  value 
of  yO  indicates  that  the  system  is  free  of  congestion  to  a  certain  degree.  A  transmission  rate 
manager  is  designed  using  yO  as  input  to  adjust  the  transmission  rates  of  sources.  Thus,  the 
network  available  bandwidth  is  maximally  utilized  while  maintaining  the  quality  of  service. 
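
A transmission rate manager of this kind could be as simple as a multiplicative update driven by the crisp control action. The report does not give the actual update law, so the gain, the rate bounds, and the function name below are assumptions made for illustration.

```python
def adjust_rate(rate, y0, gain=0.5, min_rate=0.1, peak_rate=10.0):
    """Return a new type 2 source rate for a crisp control action y0 in [-1, 1].

    A negative y0 (congestion) lowers the rate and a positive y0 raises it.
    gain, min_rate, and peak_rate are hypothetical tuning parameters.
    """
    new_rate = rate * (1.0 + gain * y0)
    # Keep the commanded rate within the source's allowed range.
    return max(min_rate, min(peak_rate, new_rate))
```

Clamping to a minimum rate keeps a source from being starved outright, and clamping to the peak rate keeps the command within what the source negotiated; both bounds are design choices of this sketch.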

Simulation  Results 

The buffer sizes for type 1 and type 2 traffic are assumed to be 100 cells each. Simulation
results showed that with the new rule structure for the congestion controller alone, the cell
loss probability is reduced from 2.9e-3 to 1.96e-3, a 32.4% reduction, and the total system
utilization increases slightly (5%). With the addition of the proposed transmission rate manager
for type 2 traffic, the total system utilization increases from 69% to 98.5%, and the maximum
cell loss probability decreases from 2.9e-3 to 1.85e-5.

2.3 Status of Accomplishments


We have developed pole-placement and optimal-control methodologies for the design of an ABR
congestion controller for ATM networks using closed-loop, rate-based flow control. In
particular, these methods show how to design an explicit rate controller. Simulation studies of
the optimal control method were carried out in MATLAB to examine transient and steady-state
performance. For appropriate values of K, the simulations showed good transient and steady-state
behavior.

In addition, we studied methods for improving the performance of a fuzzy traffic controller that
simultaneously manages congestion control and call admission control for asynchronous transfer
mode (ATM) networks. We introduced a new rule set for the congestion control part of the fuzzy
traffic controller and added a transmission rate manager to adjust the source transmission rate
of data traffic. The transmission rate manager reduces the source rate when the system is
congested to a certain degree and increases it when the system is free of congestion to a certain
degree. As a result of these improvements, the total system utilization improved from 69% to
98.5%, and QOS improved through the reduction of the cell loss probability from 2.9e-3 to
1.85e-5.
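
The reported figures can be cross-checked with a few lines of arithmetic. The helper name below is introduced only for this check; the input values are the ones quoted in the simulation results.

```python
def relative_reduction(before, after):
    """Fractional reduction from `before` to `after`."""
    return (before - after) / before


# New rule structure alone: cell loss probability 2.9e-3 -> 1.96e-3.
rule_only = relative_reduction(2.9e-3, 1.96e-3)

# With the transmission rate manager: 2.9e-3 -> 1.85e-5.
loss_factor = 2.9e-3 / 1.85e-5

# Utilization change, in percentage points: 69% -> 98.5%.
utilization_gain = 98.5 - 69.0

print(f"{rule_only:.1%} reduction, {loss_factor:.0f}x lower loss, "
      f"+{utilization_gain:.1f} points utilization")
```

The first value reproduces the 32.4% reduction quoted for the new rule structure; the second shows that adding the rate manager lowers the cell loss probability by roughly two orders of magnitude.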

2.4  Personnel  Supported 

G.V.S. Raju - Faculty investigator
G. Hernandez - Graduate student
Zhaohua Qiu - Graduate student
Xin Wang - Graduate student
S. Ye - Graduate student
Q.R. Zou - Graduate student


2.5  Technical  Publications 

1. G.V.S. Raju, Z. Qiu, and X. Wang, “Transmission Rate Manager for Traffic Control in ATM
Networks,” submitted for publication.


Conferences  - 

1. G.V.S. Raju, Z. Qiu, and X. Wang, “Transmission Rate Manager for Fuzzy Logic Control of
ATM Networks,” 2000 IEEE International Conference on Systems, Man, and Cybernetics,
Nashville, TN, October 2000.


2. G.V.S. Raju, G. Hernandez, and Q.R. Zou, “Quality of Service in Ad Hoc Networks,” IEEE
WCNC 2000, Chicago, September 2000.

3. G.V.S. Raju, “Techniques for Traffic Control in ATM Networks,” IEEE International
Conference on Systems, Man, and Cybernetics, San Diego, CA, October 1998.

2.6  Interactions  -  See  conferences  above. 

2.7  Transitions  -  None. 

2.8  Patent  Disclosures  -  None. 

2.9  Honors/Awards 
G.V.S.  Raju: 

•  IEEE  3rd  Millennium  Medal 

Cluster  II:  PARALLEL  &  DISTRIBUTED  COMPUTING 

3  Fast  Parallel  and  Distributed  Algorithms 

3.1  Objectives 

During the report period, we continued our development of fast parallel and distributed
algorithms for the coordination of parallel tasks on a distributed system. Computer resource
issues such as memory demands, parallel task scheduling, and network communication protocols
dominated our interests. In-house clusters of distributed multiprocessors, connected by Fast
Ethernet and high-speed switching networks (such as ATM and Myrinet), provided an affordable
high-performance computational platform on which to test and evaluate different implementation
strategies.

Three basic levels of parallel algorithm design are analyzed. The first two, verification of
the theoretical computational complexity and of the theoretical communication complexity, are
standard approaches. For a cluster of distributed multiprocessors, the complexity of network
communications is complicated by the integration across different platform interfaces. The third
level of complexity arises from the particular choice of network hierarchy employed and the
overhead costs that arise from its use.

An example that illustrates this case is the partitioning of a graph into P segmented portions.
If the graph is fully connected, the number of connections grows quadratically in the number of
portions P, because each portion is allowed to communicate with all other portions. Such an
implementation is clearly scalable neither in hardware nor in communication demand. A technique
can be introduced that reduces this interconnection complexity to linear in P, but at the cost
of additional memory demands. This approach, though memory intensive, is a candidate for
scalable implementations.
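
The trade-off between interconnection count and memory can be made concrete with a small sketch. It assumes the linear scheme relays all traffic through one coordinating process, in the spirit of the Master/Slave structure described in Section 3.2; the function names and the token format are hypothetical.

```python
from collections import defaultdict


def all_to_all_links(p):
    """Pairwise links when each of p portions talks to every other portion."""
    return p * (p - 1) // 2


def relay_links(p):
    """Links when every portion talks only to a single relay process."""
    return p


def route_through_relay(tokens, owner):
    """Group (source_cell, target_cell) tokens by the processor owning the target.

    `owner` maps a cell id to its processor id. The relay holds every buffer
    at once, which is where the additional memory demand comes from.
    """
    buffers = defaultdict(list)
    for source, target in tokens:
        buffers[owner[target]].append((source, target))
    return dict(buffers)
```

For P = 8 the all-to-all layout needs 28 links against 8 for the relay, but the relay must hold the outgoing buffers for all processors simultaneously.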

3.2  Status  of  Accomplishments 

1. SYNET is a large-scale synaptic network simulator that has been developed to simulate
the dynamic electro-ionic behavior of the neuronal network of the brain. The simulation
uses the Hodgkin-Huxley equations, which model the coupling of electrical stimulation of a
neuron with the ionic currents that are connected through corresponding axons. Aside from
the use of numerical techniques to decrease the simulation time, the simulation is
represented as a dynamic graph problem with N nodes (neuronal cells), each connected to
all (N-1) remaining nodes. Although the graph node connections (arcs) are known statically,
the use of these connections is resolved dynamically, based on the solutions of the system
of ODEs that model the firing or non-firing of a neuronal cell. The graph nature of this
problem suggests a naive algorithm whose implementation on P distributed processors would
result in a complexity that grows quadratically in P. At this point we have debugged and
verified the correctness of a linear-complexity scheme that uses a Master/Slave parallel
structure, in which one processor (or a subset of processors) receives tokens from all
nodes that indicate the activated connections and their destination processors. In this
way, each processor receives a single buffer with firing information for all cells handled
by that particular processor. Although the Master/Slave paradigm addresses the problem to
some degree, the scalability of the resulting implementation is limited by the memory
available to the Master processor. In this regard, we have designed a multi-stage,
circular-shift memory configuration that can be implemented on clusters of distributed
workstations. We are now analyzing its memory complexity and testing an actual
implementation.

2. A comparison of message passing protocols on different communication architectures was
completed. The performance of TCP/IP and reliable UDP was studied on both a 10 Megabit
Ethernet and the Fore ATM switch. The preliminary findings indicate that TCP/IP performs
better than reliable UDP on both the Ethernet and the ATM switch; however, this is the case
only for statically assigned TCP/IP connections. For randomly established dynamic
connections, reliable UDP is still better. These results have been incorporated into
several of the distributed simulations reported here.

Finally, experiments to measure the cost of synchronizing processes on an ATM switch are
being designed and carried out. This examination is done from two vantage points. The first
takes a set of single-processor machines and connects them through an ATM switch. The
second takes a set of small multiprocessor machines and examines the effect of using their
internal bus for interprocessor communication and the ATM switch for inter-node
communication.

3. The parallel spin-glass simulated annealing algorithm that has been implemented on
clusters of SPARC-20 and Ultra-2 workstations has given rise to further studies of a
modified cooling scheme central to the simulated annealing technique. The proposed cooling
scheme is a “split phase” technique in which rapidly converging methods are replaced by
less rapidly converging methods at the tail end of the cooling process. This procedure
increases the probability of jumping out of a local minimum and enlarges the sample space.
These results emerged from the different parallel mapping constructs considered for the
implementation of the spin-glass simulation.
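
The split-phase idea in item 3 can be illustrated with a cooling schedule that switches from a fast to a slow decay once the temperature falls below a threshold. The report does not specify the two cooling laws, so the geometric factors, the switch temperature, and the function names here are assumptions.

```python
import math


def split_phase_schedule(t0, t_switch, t_end, alpha_fast=0.90, alpha_slow=0.99):
    """Yield temperatures from t0 down to t_end.

    Cooling is fast (factor alpha_fast) while the temperature is above
    t_switch and slow (factor alpha_slow) in the tail of the process.
    """
    t = t0
    while t > t_end:
        yield t
        t *= alpha_fast if t > t_switch else alpha_slow


def accept(delta_e, t, u):
    """Metropolis rule: always accept downhill moves; accept an uphill move
    of size delta_e when the uniform draw u in [0, 1) is below exp(-delta_e / t)."""
    return delta_e <= 0 or u < math.exp(-delta_e / t)
```

Because the tail phase cools slowly, the sampler spends many more steps at temperatures where uphill moves are still occasionally accepted, which is one way to raise the chance of leaving a local minimum.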

3.3  Personnel  Supported 

Robert  E.  Hiromoto  -  Faculty  investigator 
Lichun  Zhao  -  Research  Assistant 
Siva  Pochiraju  -  Research  Assistant 


3.4  Technical  Publications 
Journals  - 

1. Robert E. Hiromoto and Joanne Simmons, “A Split-Phase Simulated Annealing Algorithm,”
to be submitted to the Journal of Computational Physics.

2. Georgios Kousi and Robert E. Hiromoto, “Psi-net: a large-scale parallel synaptic network
simulator,” to appear in Parallel Computing.

3. Robert E. Hiromoto and Lichun Zhao, “Verifying the 40 Hz neuronal synchronization frequency
using Psi-net,” submitted to the Journal of Computational Neuroscience.

Conferences  - 

1. Robert E. Hiromoto, “Strictness Analysis for Parallelism,” Third Scottish Functional
Programming Workshop, Stirling, Scotland, 22-24 Aug. 2001.

2. A. Nasipuri, J. Mandava, H. Rao, and R.E. Hiromoto, “On-Demand Routing Using
Directional Antennas in Mobile Ad Hoc Networks,” Proceedings: 9th International Conference
on Computer Communications and Networks, Las Vegas, Nevada, Oct. 16-18, 2000, pp.
535-541.

3. A. Nasipuri, S. Ye, J. You, and R.E. Hiromoto, “A MAC Protocol for Mobile Ad Hoc
Networks Using Directional Antennas,” IEEE Wireless Communications and Networking
Conference 2000, Chicago, IL, Sept. 23-28.

4. R.E. Hiromoto, “Notes on Optimizing Parallel Strategies in GpH,” Second Scottish
Functional Programming Workshop, St. Andrews, Scotland, 26-28 July 2000.

3.5  Interactions 

Hiromoto  -  Invited  presentations: 

1. “A Simulated Annealing Split-Phase Cooling Scheme for Memory Recall,” Advanced
Research Workshop on “High Performance Computing: Technology and Applications,” Cetraro,
Italy, June 12-15, 2000.

2. “An Information-Based Complexity Metric for the Degree of Algorithmic Parallelism,”
Heriot-Watt University, Edinburgh, Scotland, July 15, 1999.

3. “A Design for a Large Scale Synaptic Network Simulator,” Advanced Research Workshop on
“High Performance Computing: Technology and Applications,” Cetraro, Italy, June 22-24,
1998.

4. “A Parallel Elliptic PDE Solver Using a Hybrid Monte Carlo Boundary Propagation Method,”
Tuskegee University, Tuskegee, Alabama, Jan. 22, 1998.

5. “Information-Based Complexity and Parallelism,” Stirling University, Stirling, Scotland,
Sept. 19, 1997.

3.6  Transitions  -  None. 

3.7  Patent  Disclosures  -  None. 

3.8  Society  Service/Distinction 

Hiromoto: 

1.  Program  Co-Chair  (Advanced  Computing):  Intelligent  Data  Acquisition  and  Advanced 
Computing  Systems:  Technology  and  Applications,  Foros,  Crimea  Ukraine,  1-5  July  2001. 

2.  Steering  Committee  Member:  ParCo2001,  Naples,  Italy,  4-7  Sept.  2001. 

3.  Director:  Center  of  Advanced  Computing  and  Network  Research,  July  1999  -  Dec.  2001. 


4. Selected for inclusion in the 5th edition of Marquis Who’s Who in Science and
Engineering.

5.  Research  Advisory  Committee  for  the  Army  High  Performance  Computing  Research  Center 
(AHPCRC),  November  2000. 

6.  Program  Committee:  International  Conference  on  Supercomputing,  Santa  Fe,  New  Mexico, 
May  8-11,  2000. 

7. Steering Committee Member: ParCo99, Delft, Netherlands, August 1999.

8.  Program  Committee  Member:  Discrete  Mathematics  and  Theoretical  Computer  Science 
Conference,  Auckland,  New  Zealand,  January,  1999. 

9.  Program  Committee:  IPPS99  Workshop  on  High  Performance  Data  Mining,  San  Juan, 
Puerto  Rico,  April  6,  1999. 

10.  Invited  Public  Presentation:  President  Clinton’s  Commission  on  Critical  Infrastructure 
Protection,  May  13,  1997,  Houston,  Texas. 

11.  Regional  Editor  for  the  journal  Parallel  Computing. 

3.9  Honors/Awards 
Robert  Hiromoto: 

1.  NSF,  “Research  Experience  for  Minority  Students  in  High-Performance  Computing  and 
Communication,”  $1.5  million,  Sept.  2001  -  August  2006,  PI. 

2. NSF, “Computational and visualization methods for large-scale biophysical neural
networks,” $100,000, Jan. 1997 - Dec. 1999, PI.


4  Designing  Versatile  High-Speed  LANs 

4.1  Objectives 

This research investigates new routing and switch design techniques to provide predictable,
high bandwidth for a variety of applications, such as video streaming services and large-scale
scientific computing on computer clusters. Current high-speed local and wide area networks (LANs
and WANs) offer impressive data rates and increasingly use the Internet Protocol (IP). Because
IP provides packet routing with no quality of service (QoS) guarantees, such networks, though
relatively inexpensive to deploy, do not meet the latency constraints of applications with a
high volume of communication. Examples of such applications include large-scale scientific
computing and video streaming services. The goal is to design new LANs that interconnect
hundreds of PCs or workstations and provide dependable performance for a variety of data
communication needs, even under high loads.


4.2  Status  of  Accomplishments 

Path-Based Multicasting in Multicomputer Networks. We have developed a path-based
multicast (one-to-many communication) method and compared its performance against many other
such techniques proposed in the literature. We have shown that this technique performs well by
using fewer resources and reducing congestion in the network. This work has been published in
IEEE Transactions on Parallel and Distributed Systems.


Low-cost Adaptive Switches. We have developed adaptive message routing methods that
work using small 3x3 crossbars as building blocks. In particular, we have shown that, using
such building blocks, adaptive routing switches can be designed for networks with mesh topology.
This result differs from those known in the literature, which assume the use of large crossbars
for adaptive routing. This work has been published at the IEEE High Performance Computing (HiPC)
conference.

4.3  Personnel  Supported 

Rajendra  V.  Boppana  -  participating  investigator 
Kui  Cai  -  Research  Assistant 


5  Memory  Access  Mechanism 

5.1  Objectives 

This research tackles the issue of providing high local memory utilization at low cost. We study
the issue in the context of Distributed Shared Memory (DSM) multiprocessors and of
high-performance out-of-order superscalar microprocessors.

5.2 Status of Accomplishments


Directory-based cache coherence schemes are commonly used in large-scale distributed
shared-memory multiprocessors. However, the directory schemes proposed so far have two major
drawbacks when compared to shared-bus based ‘snooping’ schemes. One drawback is the excessive
cost of maintaining a directory, in terms of storage requirement and access time. The other is
the difficulty of obtaining system-wide information on the status of all memory blocks matching
a given address. We propose a new address mapping that remedies both drawbacks by changing the
viewpoint on memory addresses from ‘memory-centric’ to ‘cache-centric’. The proposed address
mapping guarantees minimum storage space for the directory and provides system-wide status
information on all memory blocks as easily as in a shared-bus snooping scheme. Based on this new
directory scheme, we devised a non-blocking coherency protocol, which reduces the potentially
long directory processing time caused by conflicts at the directory.

On-chip data cache memory for high-performance processors has grown tremendously in size and
complexity, to the point that it consumes more transistors than the processor core and its
access takes more than a single processor cycle. However, faster processor cycles due to
reduced feature size demand lower data access latency, and higher exploitation of ILP in
wide-issue superscalar processors requires higher data bandwidth. On-chip data cache memory
design thus becomes increasingly inefficient and adds significantly to the hardware complexity.
We propose a way of splitting data cache memory to mitigate these problems. By splitting the
cache based on ‘access region’, i.e., the area in which a data variable is allocated, the memory
stream for data access can be partitioned into multiple independent streams early in the
processor pipeline. By feeding each stream to a separate memory access queue and cache, we can
meet the high-bandwidth and low-latency requirements more efficiently and with less hardware
complexity.
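
The partitioning step can be pictured with a small software model. The region names, the address ranges, and the use of fully resolved addresses are all assumptions made for illustration; a hardware design would have to identify the region early in the pipeline rather than from the final address.

```python
from collections import deque

# Hypothetical address-space layout; real bounds depend on the ABI and the OS.
REGIONS = {
    "global": range(0x1000_0000, 0x2000_0000),
    "heap": range(0x2000_0000, 0x7000_0000),
    "stack": range(0x7000_0000, 0x8000_0000),
}


def region_of(address):
    """Return the name of the access region that contains `address`."""
    for name, span in REGIONS.items():
        if address in span:
            return name
    raise ValueError(f"address {address:#x} lies outside every modeled region")


def partition_stream(accesses):
    """Split one memory access stream into one FIFO queue per access region."""
    queues = {name: deque() for name in REGIONS}
    for address in accesses:
        queues[region_of(address)].append(address)
    return queues
```

Each queue preserves program order within its own region, so every queue could feed a separate, smaller cache independently of the others.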

5.3  Personnel  Supported 
Gyungho  Lee  -  Faculty  investigator 


5.4  Technical  Publications 
Journals  - 

1.  S.  Cho,  J.  Kong,  and  G.  Lee,  “Coherence  and  Replacement  Protocol  of  DICE  -  A  Bus- 
Based  COMA  Multiprocessor,”  Journal  of  Parallel  and  Distributed  Computing,  Vol.  57, 
pp.  14-32,  1999. 

2. J. Kong and Gyungho Lee, “Local Memory Binding in Distributed Shared Memory
Multiprocessors”, under review for the IEEE Transactions on Parallel and Distributed Systems
(currently being revised for second review).

3.  J.  Kong,  P.  Yew,  and  Gyungho  Lee,  “Minimizing  directory  size  in  large-scale  distributed 
shared  memory  multiprocessors”,  submitted  to  the  Journal  of  Parallel  and  Distributed 
Computing,  May  1999. 

4. J. Kong, P. Yew, and Gyungho Lee, “Non-blocking directory protocol for large-scale
distributed shared memory multiprocessors,” submitted to the IEEE TPDS, June 1999.

Conferences  - 

1. S. Cho, P. Yew, and Gyungho Lee, “Decoupling Local Variable Accesses in a Wide-Issue
Superscalar Processor”, Proc. of the 26th International Symposium on Computer
Architecture, Atlanta, GA, May 1999.

2. S. Cho, P. Yew, and Gyungho Lee, “Access Region Locality for high bandwidth processor
memory design”, to be presented at the IEEE Micro-32, Haifa, Israel, Nov. 1999.

5.5  Interactions 

5.6  Transitions  -  None. 

5.7  Patent  Disclosures  -  None 

5.8  Society  Service/Distinction 

Gyungho  Lee: 

•  Guest  editorial  introduction  to  the  special  issue  on  Interaction  between  Compilers  and 
Computer  Architecture,  ACM  SIGARCH  Computer  Architecture  News,  Vol.  27,  No.  1, 
March  1999  (with  P.  Yew). 

6  Metacomputing:  Application  Models,  Scheduling  Algorithms, 
and  System 

6.1  Objectives 

The objective of this research project is to examine several important problems in the area of
metacomputing: the seamless integration of geographically dispersed computing resources
connected by emerging high-speed wide-area networks. We are working in four key problem areas
within metacomputing: scheduling and resource management, wide-area I/O, fault tolerance, and
applications.

6.2  Status  of  Accomplishments 

In the area of wide-area I/O, we have successfully applied Smart File Object (SFO) technology
to a distributed gene sequence application running across the Internet and the vBNS, improving
performance by 30%. We have also applied the concept to an MPEG player accessing Internet MPEG
files; playtime was improved by 20% and download wait was reduced by 50%.

In the area of scheduling and resource management, we have built two scheduling systems,
Prophet and Gallop. Prophet is a scheduler for SPMD data-parallel and task-parallel pipeline
applications in shared, heterogeneous workstation networks. It has been successfully applied
to numerous parallel scientific applications, such as gene sequence comparison, electromagnetic
scattering using finite elements, and image processing pipelines, achieving good results (20%
performance improvement). Gallop is a scheduler for SPMD applications in wide-area networks. It
consults a set of network sites to determine where the application will be most efficiently
executed. The sites use dynamically constructed performance prediction models to estimate
application performance using the resources available within the site. We have constructed an
Internet-based Gallop testbed that has demonstrated the benefits of exploiting remote resources
to achieve better performance even with current Internet technology (14% improvement over local
execution).
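
The site-selection step attributed to Gallop above can be sketched as choosing the site whose prediction model reports the lowest estimated run time. This is not Gallop's actual interface: the function name, the callable-per-site representation, and the example cost models are assumptions made for illustration.

```python
def choose_site(predictors, problem_size):
    """Return (site, predicted_seconds) for the site with the lowest estimate.

    `predictors` maps a site name to a callable that estimates run time in
    seconds for the given problem size, using that site's available resources.
    """
    estimates = {site: predict(problem_size) for site, predict in predictors.items()}
    best = min(estimates, key=estimates.get)
    return best, estimates[best]


# Hypothetical models: the remote site is faster per unit of work but pays a
# fixed wide-area transfer cost before it can start.
example_predictors = {
    "local": lambda n: n / 10.0,
    "remote": lambda n: 5.0 + n / 40.0,
}
```

Under these made-up models a small problem stays local, because the transfer cost dominates, while a large one is sent to the remote site.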

6.3  Personnel  Supported 

Jon  B.  Weissman  -  Faculty  investigator 

6.4  Technical  Publications 
Journals  - 

1.  Jon  B.  Weissman,  Mike  Gingras,  and  Mahesh  Marina,  “Optimizing  Remote  File  Access 
for  Parallel  and  Distributed  Network  Applications,”  Journal  of  Parallel  and  Distributed 
Computing,  November  2001. 

2.  Jon  B.  Weissman,  “Prophet:  Automated  Scheduling  of  SPMD  Programs  in  Workstation 
Networks,”  Concurrency:  Practice  and  Experience,  Vol.  11,  No.  6,  May  1999. 

3. Jon B. Weissman, “Gallop: The Benefits of Wide-Area Computing for Parallel Processing,”
Journal of Parallel and Distributed Computing, Vol. 54, No. 2, pp. 183-205, November
1998.

4. Jon B. Weissman and Xin Zhao, “Scheduling Parallel Applications in Distributed
Networks,” Journal of Cluster Computing, Vol. 1, No. 1, pp. 109-118, May 1998, invited
paper.


Conferences  - 

1.  Jon  B.  Weissman,  “Fault  Tolerant  Wide-Area  Parallel  Computing,”  IEEE  Workshop  on 
Fault-Tolerant  Parallel  and  Distributed  Systems,  International  Parallel  and  Distributed 
Processing  Symposium  IPDPS,  May  2000. 

2.  Jon  B.  Weissman,  “Scheduling  Multi-Component  Applications  in  Heterogeneous  Wide-area 
Networks,”  Heterogeneous  Computing  Workshop,  International  Parallel  and  Distributed 
Processing  Symposium  IPDPS,  May  2000. 

3. Jon B. Weissman, “Fault Tolerant Computing on the Grid: What are my options?” to
appear in the Eighth IEEE International Symposium on High Performance Distributed
Computing (HPDC), August 1999 (short paper).

4.  Mike  Gingras  and  Jon  B.  Weissman,  “Smart  Multimedia  File  Objects,”  to  appear  in  IEEE 
Workshop  on  Internet  Applications  (WIA),  July  1999. 

5.  Jon  B.  Weissman,  “Smart  File  Objects:  A  Remote  File  Access  Paradigm,”  Sixth  ACM 
Workshop  on  I/O  in  Parallel  and  Distributed  Systems  (IOPADS),  May  1999. 

6. Jon B. Weissman, “Metascheduling: A Scheduling Model for Metacomputing Systems,”
Proceedings of the Seventh IEEE International Symposium on High Performance
Distributed Computing, August 1998 (short paper).

7. Jon B. Weissman, “Gallop: The Benefits of Wide-Area Computing for Parallel Processing,”
Journal of Parallel and Distributed Computing, Vol. 54, No. 2, November 1998.

6.5  Interactions 

6.6  Transitions  -  None. 

6.7 Patent Disclosures - None.

6.8 Honors/Awards

J.  Weissman: 

•  Texas  Advanced  Research  Program,  “Smart  File  Objects:  An  Application-directed  File 
Access  Paradigm,”  January  1998  -  December  1999. 

• NSF Faculty Early Career Development (CAREER) award, “Resource Management for
Parallel and Distributed Systems,” August 1996 - July 2000.


7  Optimal  Resource  Distribution  in  Multiprocessor  Networks 

7.1  Objectives 

This project focuses on developing advanced resource distribution strategies for
wormhole-routed networks to enhance system performance in both packet transmission latency and
sustained data traffic rate.

7.2  Status  of  Accomplishments 

Our simulation results have shown that the proposed techniques clearly outperform traditional
ones. Two papers on these results were presented at PDPTA’98 and APADS’98. Several other
related papers have also been published.

7.3  Personnel  Supported 

Wei-Ming  Lin  -  Faculty  investigator 
Chunhui  Zhao  -  MS  graduate  student 
An-Yi  Yang  -  MS  graduate  student 
Qing  Wan  -  MS  graduate  student 


7.4  Technical  Publications 
Journals  - 

1. Wei-Ming Lin and Xiaomei Zhu, “Allocation Time-Based Processor Allocation Scheme for
2D Mesh Architecture,” Journal of Information Science and Engineering, Vol. 16, No. 2,
pp. 301-311, March 2000.

2.  Wei-Ming  Lin  and  Wei  Xie,  “Load  Skewing  Task  Assignment  to  Minimize  Communication 
Conflicts  on  Network  of  Workstations,”  Parallel  Computing  26  (2000),  pp.  179-197. 

3. Wei-Ming Lin and Qiuyan Gu, “Task Scheduling on Bus-based Networks of Workstations”,
to appear in Parallel and Distributed Computing (PDCP) Journal, Special Issue: Cluster
Computing, Vol. 2, pp. 175-184, June 1999.

4. Wei-Ming Lin and Wei Xie, “Minimizing Communication Conflicts with Load-Skewing Task
Assignment Techniques on Networks of Workstations,” to appear in Informatica, Vol. 23,
No. 1, 1999.

Conferences  - 

1.  Wei-Ming  Lin  and  Chi-Mo  Dai,  “An  Advanced  Deadlock  Recovery  Scheme  for  Wormhole- 
Routed  Networks,”  LASTED  International  Conference  on  Parallel  and  Distributed  Com¬ 
puting  and  Systems  (PDCS  2000),  Las  Vegas,  NV,  Nov.  2000. 

2.  Wei-Ming  Lin  and  Qing  Wan,  “Intelligent  Job  Scheduling  for  Mesh-Connected  Multi¬ 
computers  with  Feedback  Control,”  LASTED  International  Conference  on  Parallel  and  Dis¬ 
tributed  Computing  and  Systems  (PDCS  2000),  Las  Vegas,  NV,  Nov.  2000. 

3.  Wei-Ming  Lin  and  Ding  Wei,  Cost-Efficient  Branch  Prediction  Hardwares”,  13th  Interna¬ 
tional  Conference  on  Computers  and  Their  Applications  (CATA’98),  Honolulu,  Hawaii, 
March  1998. 

4.  Wei-Ming  Lin,  Qiuyan  Gu  and  Wei  Xie,  “DCP-NOW:  A  DCP-based  Task  Scheduling  Tech¬ 
nique  for  Networks  of  Workstations”,  1998  International  Conference  on  Parallel  and  Dis¬ 
tributed  Processing  Techniques  and  Applications  (PDPTA’98),  Las  Vegas,  NV,  July  1998, 

5. Wei-Ming Lin and Chunhui Zhao, “Wormhole Routing with Priority-based Channel
Allocation,” 1998 International Conference on Parallel and Distributed Processing Techniques
and Applications (PDPTA’98), Las Vegas, NV, July 1998.

6. Wei-Ming Lin and Chunhui Zhao, “Look-Ahead Traffic Distribution in Wormhole-Routed
Networks,” Workshop on Advances in Parallel and Distributed Systems (APADS’98), West
Lafayette, IN, Oct. 1998.
8 Compilation Techniques for Parallel and Distributed Systems

8.1  Objectives 

This research project studies program analysis techniques for compilation and execution on
parallel and distributed systems. Fast restructuring compilers that automatically parallelize
sequential programs need efficient dependence analysis techniques that obtain exact dependence
information and rule out false dependences that would otherwise restrict parallelism. The
objective of this project is the implementation and analysis of an efficient and exact dependence
analyzer for parallelizing compilers.

8.2  Status  of  Accomplishments 

The dependence analysis techniques proposed to date fall into two categories: efficient but
approximate, or exact but exponential. In our analytical evaluation, we proved the fundamental
relationship between the major data dependence tests. We derived the conditions under which
the polynomial tests produce exact answers, and the conditions under which the exponential
tests become efficient. In our empirical evaluation, we compared the data dependence tests in
terms of accuracy and efficiency, running experiments on benchmarks and scientific libraries.
We concluded that most cases can be resolved with polynomial-time techniques. In the extreme
cases in which the polynomial tests cannot provide an exact answer, an exponential test can be
invoked as a backup. This approach significantly reduces the cost of performing exact
dependence analysis in practice.
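
The cascade described above can be illustrated with a minimal Python sketch. The report does not name the specific tests used, so this sketch assumes the GCD test and a Banerjee-style bounds test as representative polynomial tests, and a brute-force search as a stand-in for an exact exponential test. It handles only a single pair of one-dimensional references A[a*i + a0] and A[b*j + b0] with lo <= i, j <= hi; all function names are hypothetical.

```python
from itertools import product
from math import gcd


def gcd_test(a, b, c):
    """Cheap test: False means a*i - b*j = c has no integer solution at all."""
    g = gcd(a, b)
    if g == 0:
        return c == 0
    return c % g == 0


def bounds_test(a, b, c, lo, hi):
    """Cheap Banerjee-style test: False means c lies outside the range
    that a*i - b*j can take when lo <= i, j <= hi."""
    min_val = min(a * lo, a * hi) - max(b * lo, b * hi)
    max_val = max(a * lo, a * hi) - min(b * lo, b * hi)
    return min_val <= c <= max_val


def exact_test(a, b, c, lo, hi):
    """Backup test: exhaustive search over the iteration space.
    Its cost grows exponentially with the number of loop variables."""
    space = product(range(lo, hi + 1), repeat=2)
    return any(a * i - b * j == c for i, j in space)


def depends(a, a0, b, b0, lo, hi):
    """Return (dependence_exists, name_of_deciding_test) for the references
    A[a*i + a0] and A[b*j + b0], which collide when a*i - b*j = b0 - a0."""
    c = b0 - a0
    if not gcd_test(a, b, c):
        return False, "gcd"
    if not bounds_test(a, b, c, lo, hi):
        return False, "bounds"
    # Both cheap tests were inconclusive, so fall back to the exact test.
    return exact_test(a, b, c, lo, hi), "exact"
```

For example, `depends(2, 0, 2, 1, 0, 10)` (A[2i] versus A[2j+1]) is settled by the GCD test alone, while `depends(3, 0, 5, 2, 0, 2)` passes both cheap tests and needs the backup search to establish independence. In this sketch the cheap tests can only prove independence; the conditions under which they are exact, as derived in the project, are not modeled.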